Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor

Author

  • Stephen W. Keckler
Abstract

Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Smaller feature sizes have decreased the transistor switching time but at the same time increased the resistance of interconnect wires, resulting in slower signal transmission in on-chip wiring. Since future chips will have more silicon area and include more execution units, a much larger demand for parallelism is emerging. However, the increased significance of wire delay will require that monolithic components, such as processors and caches, be small and that the communication wires connecting them be short. Computer systems typically exploit concurrency using either instruction-level parallelism (ILP) or coarse-grain parallel threads running on a multiprocessor. This thesis proposes mechanisms for exploiting on-chip parallelism at a fine grain to bridge the gap between ILP and coarse-grain multiprocessing. Fast interprocessor communication and synchronization enable the use of tasks with run lengths as small as 10 cycles. At the same time, these interaction mechanisms are less susceptible than conventional microprocessor designs to the longer wire delays imminent in future silicon process technologies. As fine-grain parallelism is orthogonal to ILP and coarse-grain threads, it complements both methods and provides an opportunity for greater speedup. This thesis presents the architecture and implementation of the MIT Multi-ALU Processor (MAP), a 5 million transistor custom VLSI microprocessor chip. The MAP architecture incorporates 9 function units, split into 3 independent processors. The processors communicate via interprocessor register writes and synchronize using a hardware barrier instruction. These integrated mechanisms allow threads to communicate 10 times faster and synchronize 60 times faster than using a shared on-chip cache. The fast interprocessor interaction enables the MAP to exploit both instruction-level parallelism and fine-grain thread-level parallelism. On a suite of applications, speedups of 1.2 to 2.4 are achieved using fine-grain threads on a 3-processor MAP chip.

Thesis Supervisor: Dr. William J. Dally
Title: Professor of Electrical Engineering and Computer Science


Similar resources

An On-Chip Multiprocessor Architecture with a Non-Blocking Synchronization Mechanism

tive to superscalar architectures [5][8][12][13]. Strengths of an on-chip MP architecture are threefold. First, an MP can exploit different level parallelism, thread-level parallelism (TLP), in addition to ILP. Second, the complexity can be suppressed using simple processors. This ensures a high clock rate. Third, communication latency can be significantly reduced using an on-chip network. Thes...


Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor

Thread-level speculation (TLS) makes it possible to parallelize general purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a bound on the performance of a TLS machine that is promising. In particular, TLS makes it feasible to ...


A Flexible, Efficient Concurrent Garbage Collector for Speculative Thread Processors

Michael Chen and Kunle Olukotun Computer Systems Lab, Stanford University Abstract In this paper, we introduce a novel garbage collector for Java to be used for processors with speculative threads support like the Hydra chip multiprocessor (CMP). Thread speculation permits parallel execution of sections of sequential code with data dependencies enforced in the hardware, eliminating the need for...


Compiler Optimization of Value Communication for Thread-Level Speculation

In the context of Thread-Level Speculation (TLS), inter-thread value communication is the key to efficient parallel execution. From the compiler’s perspective, TLS supports two forms of inter-thread value communication: speculation and synchronization. Speculation allows for maximum parallel overlap when it succeeds, but becomes costly when it fails. Synchronization, on the other hand, introduc...


A design study of the EARTH multiprocessor

Multithreaded node architectures have been proposed for future multiprocessor systems. However, some open issues remain: can efficient multithreading support be provided in a multiprocessor machine such that it is capable of tolerating synchronization and communication latencies, with little intrusion on the performance of sequentially-executed code? Also, how much (quantitatively) does such non-...




Publication date: 1998